Abstract

Over the past thirty years, there has been consid- 
erable progress in the design of natural language 
interfaces to databases. Most of this work has con- 
cerned snapshot databases, in which there are only 
limited facilities for manipulating time-varying in- 
formation. The database community is becoming 
increasingly interested in temporal databases,
databases with special support for time-dependent
entries. We have developed a framework for
constructing natural language interfaces to temporal
databases, drawing on research on temporal phe- 
nomena within logic and linguistics. The central 
part of our framework is a logic-like formal lan- 
guage, called TOP, which can capture the semantics 
of a wide range of English sentences. We have im- 
plemented an HPSG-based sentence analyser that 
converts a large set of English queries involving 
time into TOP formulae, and have formulated a 
provably correct procedure for translating TOP ex- 
pressions into queries in the TSQL2 temporal data- 
base language. In this way we have established a 
sound route from English to a general-purpose tem- 
poral database language. 
1 Background

Time is an important research topic in both
linguistics (tense and aspect theories; see [?], [?]
for an introduction) and logic (temporal logics; see
[29]). Computer scientists are also becoming
increasingly interested in temporal databases,
databases that are intended to store not only present
but also past and future facts, and that generally
provide special support for the notion of time
[16] [28]. Although interesting ideas have emerged in
all three time-related disciplines, these ideas have
remained largely unexploited in the area of natural
language interfaces to databases (Nlidbs; see [?],
[?], and [?] for an introduction to Nlidbs). Most
Nlidbs cannot answer questions involving time,
because: (a) they cannot cope with the semantics of
natural language temporal expressions (e.g. verb
tenses, temporal adverbials), and (b) they were
designed to interface to "snapshot" database systems,
which provide no special support for the notion of
time.

Previous research on Nlidbs for temporal databases
has ignored important temporal linguistic phenomena,
used meaning representation languages that are not
fully defined, or assumed ad hoc temporal database
models and languages. Clifford [?], for example, has
formally defined a temporal version of the relational
database model, and a fragment of English that can be
used to query databases structured according to his
database model. Clifford's approach is interesting in
that the semantics of both the English fragment and
the temporal database model are defined within a
Montague semantics framework [15]. However,
Clifford's coverage of English is extremely narrow,
and the semantics of the English mechanisms for
expressing time are oversimplified. For example,
perfect and continuous tenses are not supported, and
no distinction between states, events, culminated
activities, and points (section 2 below) is made.
Furthermore, there is no indication that the overall
theory has ever been used to implement an actual
Nlidb.

De et al. [13] also support only an extremely limited
subset of English temporal mechanisms, and the
underlying "temporal database" looks more like a
collection of if-then-else rules than a principled
temporal database system. In the Cle system [?], verb
tenses introduce temporal operators (section 3 below)
and event/state variables into the generated logical
expressions. The semantics of these operators and the
semantics of the event/state variables, however, are
left undefined.

Past work on Nlidbs has shown the benefits of using a
principled intermediate representation language
(typically, some form of logic) to encode the
meanings of natural language queries, with the
resulting intermediate language expressions being
available for translation into a suitable database
language (e.g. SQL [?]). Similar advantages (such as
generality, modularity and portability; see sections
5.4 and 6 of [4]) accrue from developing temporal
variants of this architecture. We have developed a
formal language, called Top, to serve as the
intermediate representation language in place of
conventional (non-temporal) logics. A temporal
extension of SQL, called Tsql2 [27], was also proposed
recently. Our architecture (in direct reflection of
existing Nlidbs) has an English query parsed into a
syntactic structure and converted into a Top
expression encoding the relevant aspects of its
meaning. This is then translated into a Tsql2 query,
and the evaluation of this query against the temporal
database supplies the answer to the original English
query.

More specifically, we have addressed the follow- 
ing issues: (a) design and implementation of a non- 
trivial English grammar handling temporal phe- 
nomena; (b) design of the Top language, including 
the definition of a precise model-theoretic seman- 
tics for it; (c) devising a systematic conversion from 
English syntactic form to Top formulae; (d) defin- 
ing translation rules from Top formulae to Tsql2 
queries, and proving the correctness of the
translation rules; (e) implementing all the above. The
full details of Top and the translation to Tsql2 are
highly formal and rather voluminous, so such
technical details are beyond the scope of this paper.
Here we concentrate on giving an overview of the 
work and the motivation for some of the directions 
we have followed. 

Section 2 below surveys, from the perspective of
Nlidbs, some of the linguistic phenomena relating to
temporal information. This discussion demonstrates
that there are real linguistic issues involved in
providing correct replies to English queries directed
to a temporal database. Section 3 outlines Top,
showing how it captures important semantic
distinctions that occur within English temporal
queries. Section 4 sketches how English sentences can
be converted systematically to Top expressions, and
section 5 summarises the salient features of the
translation from Top to Tsql2. We conclude with some
remarks about the direction which this kind of
research could take in the future.

2 The linguistic data 

There is a wealth of mechanisms for expressing time 
in English (and most natural languages). Tempo- 
ral information can be conveyed by verb tenses, 
nouns ("day", "beginning"), adjectives ("earliest", 
"annual"), adverbs ("yesterday", "twice"), prepo- 
sitional phrases ("at 5:00pm", "for two hours"), 
and subordinate clauses ("while gate 2 was open"),
to mention just some of the temporal mechanisms.
It is well-known that the semantics of English
temporal expressions cannot be modelled adequately
in the absence of some classification of verbs in
terms of the situations described by the verbs. (We
use "situation" to refer collectively to what other
authors call "event", "state", "action", "process",
etc.) Most of the classifications that have been
proposed originate from Vendler's taxonomy [?].
We use a version of Vendler's taxonomy, whereby 
verbs are divided into: state verbs, activity verbs, 
culminated activity verbs, and point verbs. 

Roughly speaking, state verbs describe a property
without referring to an action or a change in the
world. For example, "to contain" and "to border", as
in "Tank 2 contains oil." and "Greece borders
Bulgaria.", are state verbs. Activity verbs, in
contrast, refer to actions or changes in the world.
"To run" and "to advertise", as in "John ran." and
"IBI advertised a new computer.", are examples of
activity verbs. Culminated activity verbs are similar
to activity verbs, in that they describe world changes
or actions. They differ, however, from activity verbs
in that the situations they describe have an inherent
climax, a point that has to be reached for the
action/change to be considered complete. "To fix (an
engine)" and "to build (a bridge)", as in "Engineer 1
fixed engine 2." and "Housecorp built a bridge.", are
culminated activity verbs. The climax of the fixing is
the point where the repair of the engine is finished,
and the climax of the building is the point where the
construction of the bridge is completed. In contrast,
the situations described by "to run" and "to
advertise" in "John ran." and "IBI advertised a new
computer." do not seem to have inherent climaxes.
Finally, point verbs describe situations that are
perceived as instantaneous. "To explode", as in
"A bomb exploded.", is a point verb.

The class of a verb may depend on the syntactic 
complements of the verb (e.g. its object). For ex- 
ample, "to run" with no object (as in "John ran.") 
is an activity verb, but "to run" with an object de- 
noting a specific distance (as in "John ran a mile.") 
is a culminated activity verb (the climax is the 
point where John completes the mile). Aspectual 
markers (e.g. the progressive aspect) may cause a 
verb to be moved from its normal class to another 
one (this will be discussed below). 

The distinction between activity and culminated 
activity verbs can be used to account for the
so-called "imperfective paradox" [?] [19].

(1) Was IBI ever advertising a new computer? 

(2) Did IBI ever advertise a new computer? 

(3) Was engineer 1 ever fixing engine 2? 

(4) Did engineer 1 ever fix engine 2? 

If the Nlidb's answer to (1) is affirmative, then
the answer to (2) must also be affirmative. In
contrast, if the answer to (3) is affirmative, this
does not necessarily imply that the answer to (4)
will also be affirmative (engineer 1 may have
abandoned the repair before completing it; we classify
"to advertise" as an activity verb, while "to fix
(an engine)" as a culminated activity verb). In
the case of culminated activity verbs, the simple
past ("did fix") requires the climax to have been
reached (i.e. the repair must have been completed).
In contrast, the past continuous of culminated
activity verbs ("was fixing") makes no claim that the
climax was reached. Hence, an affirmative answer
to (3) does not imply an affirmative answer to (4)
(though an affirmative answer to (4) implies an
affirmative answer to (3)). In the case of activity
verbs, there is no climax, and neither the simple
past nor the past continuous make any claim that
a climax was reached. Hence, an affirmative answer
to (1) implies an affirmative answer to (2) (and vice
versa).
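
The entailment pattern in (1)-(4) can be sketched in a few lines of Python. This is an illustration only, not part of the system described in this paper: the Situation record, its field names, and the two answer functions are hypothetical, and the sketch assumes that each past situation is stored together with its verb class and a flag saying whether its climax was reached.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Situation:
    """A past situation; all names here are illustrative assumptions."""
    verb_class: str               # "activity" or "culminated_activity"
    climax_reached: bool = False  # ignored for plain activities


def past_continuous_holds(s: Situation) -> bool:
    # "Was X ever V-ing?" only needs the situation to have been ongoing.
    return True


def simple_past_holds(s: Situation) -> bool:
    # "Did X ever V?" additionally needs the climax for culminated activities.
    if s.verb_class == "culminated_activity":
        return s.climax_reached
    return True


advertising = Situation("activity")
abandoned_repair = Situation("culminated_activity", climax_reached=False)
finished_repair = Situation("culminated_activity", climax_reached=True)
```

For the abandoned repair, the past continuous question is answered affirmatively while the simple past question is not, which is the asymmetry between (3) and (4).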

The need for a classification of verbs is also
apparent when verbs combine with temporal adverbials
(see also the linguistic data of [?]). When state
verbs combine with adverbials understood as
specifying time points, the situation of the verb
must usually simply hold at the point of the
adverbial. For example, in (5) any tank that contained
oil at 5:00pm must be reported. There is no
requirement that 5:00pm must have been the point
at which the tank started or stopped containing oil.



(5) Which tanks contained oil at 5:00pm? 

(6) Which athlete ran at 5:00pm? 

(7) Who fixed an engine at 5:00pm? 

(8) Which station broadcast the President's 
message at 5:00pm? 

In contrast, in the case of activity verbs, the "at"
point is usually understood as the time at which
the activity started. For example, in (6), the most
natural reading is that the athlete started to run
at 5:00pm. (In the progressive "Which athlete was
running at 5:00pm?", however, the adverbial does
not have an inchoative meaning. This will be
discussed below.) Finally, with culminated activity
verbs (in non-progressive forms), the "at" point is
usually the time at which the climax was reached,
or in some cases the point where the change/action
described by the verb started. In (7), for example,
5:00pm is probably the time at which the repair was
completed. The inchoative meaning with culminated
activity verbs is easier to accept in (8), where
5:00pm is probably the point where the broadcasting
started. (We classify "to broadcast (a message)" as a
culminated activity verb, with the climax being the
point where the broadcasting of the message is
completed.)

Verb aspects also play an important role. In
(9), the most natural reading is that 5:00pm must
simply have been a point where the running was
ongoing. There is no implication that the running
must have started at 5:00pm. (The futurate
meanings of progressive tenses, e.g. the athlete in
(9) was going to run at 5:00pm, but perhaps never
ran, are ignored in this project.) Compare (9) to
(6), where 5:00pm is probably the time where the
running started.

(9) Which athlete was running at 5:00pm? 

(10) Who was fixing an engine at 5:00pm? 

(11) Which station was broadcasting the 
President's message at 5:00pm? 

In other words, although "to run" is an activity
verb, in (9) it behaves as if it were a state verb.
(With state verbs, the adverbial's point is simply
a point where the situation was true.) Similar
observations can be made for (10) and (11) (cf. (7)
and (8)). We account for (9)-(11) by assuming that
the progressive verb aspect transforms activity and
culminated activity verbs into state verbs. (This
is similar to Moens' view [22] that the progressive
coerces "processes" into states.)

A cancelling transformation (see section 4 below)
takes place when culminated activity verbs
combine with "for" adverbials. This transformation
cancels the normal implication that the climax
has been reached. For example, (12) implies that
the climax has been reached. In contrast, (13)
carries no such implication.



(12) Housecorp built bridge 2. 

(13) ?Housecorp built bridge 2 for two years. 

(14) Housecorp was building bridge 2 for two years. 

(15) *John fixed fault 2 for two hours. 

(16) John was fixing fault 2 for two hours. 

Some native speakers find (13) unacceptable, and
(15) is unacceptable to most native speakers. It
seems, however, a reasonable simplification to
assume that a Nlidb could treat (13) and (15) as
grammatical, and equivalent to (14) and (16)
respectively. (In (14) and (16) there is no
implication that a climax was reached.)

We note at this point that we have focused our 
work on stand-alone questions. We have not
examined discourse-related phenomena [17]. We have
also restricted our work to questions about the past 
and the present. We have not examined questions 
referring to the future. 

3 Modelling time in TOP 

This section provides an overview of Top, the
formal language we use to represent the meanings of
the English questions. Top assumes that time is
linear, discrete, and bounded [29], and expresses
temporal information using operators (Top stands
for "language with Temporal OPerators"). For
example, (17) would be expressed in Top as (18):

(17) Did tank 2 (ever) contain water? 

(18) Past[contain(tank2, water)] 

where Past is a temporal operator, which roughly
speaking requires contain(tank2, water) to be true
at some past time. The answer to (17) is affirmative
if and only if (18) evaluates to true.

Top's temporal operators have been influenced
by those of [?]. An alternative operator-less
approach would be to introduce time as an extra
argument of each predicate. In this case, (17) would
be expressed as:

(19) ∃t' contain(tank2, water, t') ∧ t' < now

where < denotes temporal precedence. (In this 
and following sections primed strings are used as 
variables.) We use temporal operators mainly be- 
cause they lead to more compact formulae. We 
make no claim regarding the expressivity of Top 
and other operator-based languages vs. operator- 
less languages. 

Speech, event, and localisation time 

Top formulae are evaluated with respect to three
parameters: speech time (st), event time (et), and
localisation time (lt). The first two are as in
Reichenbach's work [25]. st is the time point where
the question is submitted to the Nlidb. et is,
roughly speaking, an interval corresponding to the
time where the situation represented by the
formula takes place. The third parameter, lt, derives
from the logic of [?]. (It has nothing to do with
Reichenbach's "reference time".) lt is an interval
acting as a temporal window within which et must
be located.

To understand how the three parameters work,
let us consider the reading of (20) that asks if John
was running some time on 1/6/94. The corresponding
Top formula is (21).

(20) Did John run on 1/6/94?

(21) At[1/6/94, Past[run(john)]]

(21) is evaluated as follows. First, st is fixed to
the point where (20) was submitted to the Nlidb.
Initially, lt covers the whole time-axis, and et can
be any interval. Next, the At operator narrows the
localisation time window, so that it only covers the
day 1/6/94. Thus, et now has to be a subinterval
of 1/6/94. The Past operator (introduced by the
verb tense) requires lt to be narrowed, so that it
only contains time points that precede st. (If the
question is submitted after 1/6/94, the Past does
not narrow lt any further.) (21) evaluates to true
if and only if it is possible to find an et where
run(john) is true (i.e. John was running
throughout that interval), such that et is a
subinterval of lt (i.e. a subinterval of 1/6/94, if
the question is submitted after 1/6/94).
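
The evaluation steps just described can be sketched as a small Python evaluator. It is a simplified illustration under stated assumptions, not the semantics of Top itself: time points are integers, intervals are inclusive pairs, formulae are nested tuples, the `facts` dictionary maps each atomic formula to the maximal intervals where it holds, and the names TIME_AXIS, intersect, and holds are all hypothetical.

```python
from typing import Optional, Tuple

Interval = Tuple[int, int]          # inclusive [start, end], discrete time points
TIME_AXIS: Interval = (0, 10_000)   # assumed bounds of the time axis


def intersect(a: Optional[Interval], b: Interval) -> Optional[Interval]:
    """Return the overlap of two intervals, or None if they do not overlap."""
    if a is None:
        return None
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def holds(formula, st: int, lt: Optional[Interval], facts) -> bool:
    """True if some event time et inside lt makes the formula true."""
    if lt is None:                       # the window has become empty
        return False
    kind = formula[0]
    if kind == "At":                     # narrow lt to the given period
        _, period, body = formula
        return holds(body, st, intersect(lt, period), facts)
    if kind == "Past":                   # keep only points that precede st
        _, body = formula
        return holds(body, st, intersect(lt, (TIME_AXIS[0], st - 1)), facts)
    if kind == "Pred":                   # atomic formula, assumed homogeneous
        stored = facts.get(formula[1], [])
        return any(intersect(iv, lt) is not None for iv in stored)
    raise ValueError(f"unknown operator: {kind}")


DAY = (24, 47)                           # stands for 1/6/94, e.g. in hours
facts = {"run(john)": [(30, 31)]}
query = ("At", DAY, ("Past", ("Pred", "run(john)")))   # formula (21)
```

With a speech time after the day, for instance `holds(query, 100, TIME_AXIS, facts)`, the window shrinks to DAY and the stored running interval falls inside it. With a speech time before the day, Past empties the window and the answer is negative.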

Homogeneity 

Top atomic formulae (predicates) always satisfy
the following homogeneity restriction: if an atomic
formula (e.g. contain(tank2, water)) is true at an
event time et1, then it is also true at any event
time et2 that is a subinterval of et1. Non-atomic
Top formulae do not have to satisfy this restriction.
(Various versions of homogeneity have been used in
[?], [?], [?], and elsewhere.)

Progressives 

The progressives of activity and point verbs are
expressed using the same predicates that express
the corresponding non-progressive forms. For
example, the reading of (22) that asks if John was
running at some time on 1/6/94 is expressed using
(21), the same Top formula that expresses the
non-progressive (20).

(22) Was John running on 1/6/94? 

Progressives of culminated activity verbs are
expressed in a similar manner. For example, the
reading of (23) that asks if John was fixing engine
2 some time on 1/6/94 is expressed as (24).

(23) Was John fixing engine 2 on 1/6/94?

(24) At[1/6/94, Past[fixing(john, eng2)]]



State verbs typically do not appear in progressive 
forms (e.g. "Tank 2 was containing water." sounds 
odd). 

Non-progressives of culminated activity verbs 

Non-progressive forms of culminated activity verbs 
are expressed using the Culm operator and the 
predicates that correspond to the progressive forms. 
For example, (25) is expressed as (26). 



(25) Did John fix engine 2 on 1/6/94? 

(26) At[1/6/94, Past[Culm[fixing(john, eng2)]]]

In (26), the semantics of the Culm operator
requires et to cover a maximal interval where the
predicate fixing(john, eng2) is true, the end-point
of et to be a point where the repair reaches its
climax, and et to be a subinterval of lt. Assuming
that (25) is submitted after 1/6/94, when the
expression Culm[fixing(john, eng2)] is evaluated,
lt is the interval that covers exactly the day 1/6/94.
The answer to (25) will be affirmative if and only
if for some event time interval et, et covers exactly
a repair (from start to completion) of engine 2 by
John, and et is a subinterval of 1/6/94.

(26) captures the reading of (25) whereby a
repair of engine 2 by John must have both started
and been completed within 1/6/94. Under an
alternative reading, it is enough if the repair simply
reached its climax on 1/6/94. In this case, the
repair may have started, for example, the day before.
This reading is captured by (27).

(27) At[1/6/94,
Past[End[Culm[fixing(john, eng2)]]]]

According to (27), it is enough if the end-point of
an interval that covers exactly a repair from start
to completion falls within 1/6/94.

An affirmative answer to (23) (expressed as (24))
does not necessarily imply an affirmative answer
to (25) (expressed as (26)). If, for example, John
was fixing engine 2 some time on 1/6/94, but never
completed the repair, then there will be an interval
within 1/6/94 at which fixing(john, eng2) is true,
but there will be no interval at which the expression
Culm[fixing(john, eng2)] is true, because at no
point did the repair reach its climax. Hence, the
answer to (24) will be affirmative, but the answer
to (26) will be negative. This accords with the
imperfective paradox of section 2. (It should also
be easy to see that an affirmative answer to (26)
implies an affirmative answer to (24).)
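
The difference between (24), (26), and (27) can be made concrete with a short Python sketch. It is only an approximation under stated assumptions: each repair is stored as a hypothetical (start, end, climax reached) triple over integer time points, lt is taken to be already narrowed to the day by At and Past (i.e. the question is submitted after 1/6/94), and End is read as requiring only the end-point of the repair to lie in lt. None of these names come from the actual system.

```python
from typing import List, Tuple

Interval = Tuple[int, int]               # inclusive, discrete time points
Episode = Tuple[int, int, bool]          # (start, end, climax reached?)


def overlaps(a: Interval, b: Interval) -> bool:
    return max(a[0], b[0]) <= min(a[1], b[1])


def within(inner: Interval, outer: Interval) -> bool:
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def progressive(episodes: List[Episode], lt: Interval) -> bool:
    # fixing(...) holds at some et inside lt: any overlap with an episode will do.
    return any(overlaps((s, e), lt) for s, e, _ in episodes)


def culm(episodes: List[Episode], lt: Interval) -> bool:
    # Culm[fixing(...)]: et spans a whole episode, ends at its climax, lies in lt.
    return any(done and within((s, e), lt) for s, e, done in episodes)


def end_culm(episodes: List[Episode], lt: Interval) -> bool:
    # End[Culm[fixing(...)]]: only the end-point of such an et must lie in lt.
    return any(done and within((e, e), lt) for s, e, done in episodes)


DAY = (24, 47)                           # stands for 1/6/94
abandoned = [(30, 35, False)]            # repair never completed
overnight = [(20, 30, True)]             # started the day before
same_day = [(30, 35, True)]              # started and completed on the day
```

For the abandoned repair only the progressive reading succeeds, and for the overnight repair only the progressive and End readings succeed, mirroring the discussion of (24), (26), and (27).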

Wh-questions 

So far, we have considered only yes/no questions.
Questions like (28) are expressed using the
interrogative quantifier ?, as shown in (29).

(28) What did John fix?

(29) ?x' Past[Culm[fixing(john, x')]]



Figure 1: The Perf operator 



(29) says that the answer should contain any x',
such that John completed the fixing of x' in the 
past. 

The Past operator actually has a slightly more
complex form than the one we have been using up
to this point: it is indexed by a variable (e' in
the following example). The Top formula for (28)
would actually be (30).

(30) ?x' Past[e', Culm[fixing(john, x')]]

The semantics of Top binds e' to et. The e'
variable of Past is useful in time-asking questions
like (31), expressed as (32).



(31) When did tank 2 contain water? 

(32) ?mxl e' Past[e', contain(tank2, water)]

(32) reports the maximal intervals among the past
intervals at which tank 2 contained water.
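
One way to picture the answer to a question like (31) is as a merge of stored intervals. The sketch below is an illustrative reading rather than the system's actual procedure: it assumes integer time points and inclusive intervals, treats adjacent intervals as mergeable because time is discrete, and clips intervals to the points that precede the speech time st. The function names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]   # inclusive, discrete time points


def maximal_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or adjacent intervals into maximal ones."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            # Extend the last maximal interval instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def past_maximal_intervals(intervals: List[Interval], st: int) -> List[Interval]:
    """Clip every interval to the points before st, then merge."""
    clipped = [(s, min(e, st - 1)) for s, e in intervals if s <= st - 1]
    return maximal_intervals(clipped)
```

For example, if tank 2 contained water during (1, 3), (4, 6), and (10, 15), and the question is asked at time point 12, the sketch reports (1, 6) and (10, 11).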

Perfective aspect 

The perfective aspect is expressed using a special
Perf operator. Ignoring some details, Perf[e2', φ] is
true with respect to a speech time st, an event time
et1, and a localisation time lt1, if and only if (see
figure 1): (a) et1 is a subinterval of lt1, (b) there
is an et2 that ends before et1, and (c) φ is true
with respect to st, et2, and lt2, where lt2 covers
the entire time axis (i.e. lt is reset to the whole
time axis when evaluating φ). e2' is similar to the
indexing variable of Past: e2' is always bound to
et2. Intuitively, Perf[e2', φ] is true at event time
intervals that are preceded by other event time
intervals where φ is true. To illustrate the use of
Perf, let us consider (33).



(33) Had IBI advertised PPC on 1/1/85? 

(34) Did IBI advertise PPC on 1/1/85? 



(33) has two readings. Under the first reading,
it asks if IBI advertised PPC on 1/1/85 (remote
past meaning). In this case, (33) is similar to (34).
Under a second reading, (33) asks if IBI had ever
advertised PPC at any time up to (and possibly
including) 1/1/85. Under the latter reading, if IBI
advertised PPC only on 6/6/84, the answer to (33)
would still be affirmative. The two readings are
captured by (35) and (36) respectively (our system
generates both).

(35) Past[e1', Perf[e2', At[1/1/85, advertise(ibi, ppc)]]]

(36) At[1/1/85, Past[e1', Perf[e2', advertise(ibi, ppc)]]]




Figure 2: Parse tree for "John fixed an engine on
1/6/94."



Intuitively, (35) says that there must be a past
event time interval e1' = et1 that is preceded by
another event time interval e2' = et2, such that e2'
falls within 1/1/85, and advertise(ibi, ppc) is true
at e2'. In other words, the advertising takes place on
1/1/85. In contrast, (36) says that there must be a
past event time interval e1' = et1 that falls within
1/1/85, and that is preceded by another event time
interval e2' = et2 where advertise(ibi, ppc) is true.
In this case, the advertising does not necessarily
take place on 1/1/85.
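
The two readings can be contrasted with a small Python sketch. Since the definition of Perf above already ignores some details, the sketch is doubly approximate: it assumes integer time points, inclusive intervals, a list of stored advertising intervals, and hypothetical function names, and it reads "et2 ends before et1" as the end of et2 strictly preceding the start of et1.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]   # inclusive, discrete time points


def intersect(a: Interval, b: Interval) -> Optional[Interval]:
    """Return the overlap of two intervals, or None if they do not overlap."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def reading_35(adverts: List[Interval], day: Interval, st: int) -> bool:
    """Past[e1', Perf[e2', At[day, advertise]]]: the advertising is on `day`."""
    for advert in adverts:
        et2_range = intersect(advert, day)     # At applies to et2 (lt was reset)
        # et1 must start after et2 ends and must still precede st.
        if et2_range is not None and et2_range[0] + 1 <= st - 1:
            return True
    return False


def reading_36(adverts: List[Interval], day: Interval, st: int) -> bool:
    """At[day, Past[e1', Perf[e2', advertise]]]: only et1 must fall on `day`."""
    lt1 = intersect(day, (day[0], st - 1))     # At, then Past, narrow lt
    if lt1 is None:
        return False
    # Some advertising point must precede the latest possible et1.
    return any(start < lt1[1] for start, _ in adverts)
```

If the only advertising interval lies well before the day, say `[(10, 12)]` with the day at `(100, 123)` and st at 500, `reading_35` fails while `reading_36` succeeds, which corresponds to the 6/6/84 case discussed above.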

We should point out that we have examined 
only the following tenses: simple present, simple 
past, present continuous, past continuous, present 
perfect, and past perfect. We have not examined 
how other tenses could be expressed in Top. Also, 
we have specified how to express in Top tempo- 
ral subordinate clauses introduced by only "while", 
"before", and "after" (e.g. we have not considered 
clauses introduced by "when" or "since"). Finally, 
we have not examined how to express in Top tem- 
poral adjectives (e.g. "first", "annual"), nouns in- 
troducing events (e.g. "the construction of bridge 
2"), order nouns (e.g. "predecessor"), or frequency 
adverbials (e.g. "twice"). 

4 From English to TOP 

The English questions are parsed and mapped to
Top expressions using an Hpsg-based grammar [24].
The grammar was developed using Ale [?], and it is
based on previous Ale encodings of Hpsg fragments by
Penn, Carpenter, Manandhar, and Grover. Our grammar
is very close to the Hpsg version of chapter 9 of
[24], with the main exception being that the situation
theoretic semantic constructs of [24] have been
replaced by feature structures that represent Top
expressions. A detailed description of our grammar is
outside the scope of this paper (see [?]). Here we
will only attempt to offer a flavour of how the
grammar works.

Let us consider (37), which our experimental
system treats as a yes/no question.

(37) John fixed an engine on 1/6/94. 



Figure 2 shows the parse tree for (37). Arcs marked
with HE, SU, CO, and AJ correspond to head, sub- 
ject, complement, and adjunct daughters respec- 
tively. The lexical head (the verb "fixed") first 
combines with its complement (the noun phrase 
"an engine"). The resulting verb phrase combines 
with its subject ("John"), producing a sentence. 
The prepositional phrase "on 1/6/94" attaches to 
this sentence as an adjunct. In our grammar, tem- 
poral adjuncts like "on 1/6/94" or "yesterday" are 
taken to modify full sentences (verbs that have com- 
bined with their subjects and complements). There 
is only one case where our grammar allows tem- 
poral adjuncts to modify verb phrases (verbs that 
have combined with their complements but not their 
subjects), and this is in the case of past participles 
(e.g. "given"). Unlike all other verb forms, we 
allow past participles to be modified by temporal 
adverbials either before or after combining with 
their subjects. This is needed to be able to generate 
both readings of (33).

In Hpsg, the order in which the daughters of a
node appear in the surface sentence is determined
by the Constituent Ordering Principle (Cop). Our
version of Cop places no restriction on the order
between temporal adjuncts like "on 1/6/94" and
the head daughters that the adjuncts modify. Hence,
"on 1/6/94" can either follow "John fixed an
engine" as in (37), or it can precede it as in (38).
In either case, the Top formula would be the same,
i.e. (39).

(38) On 1/6/94 John fixed an engine. 

(39) ∃x' engine(x') ∧
At[1/6/94, Past[e', Culm[fixing(john, x')]]]

Let us now examine how (37) (or (38)) is mapped
to (39). A lexicon entry associates the past tense
form "fixed" with the expression in (40).¹ ("_"
denotes an empty slot.)

(40) Past[e', Culm[fixing(_, _)]]

The noun phrase "an engine" receives the
expression shown in (41). (The existential quantifier
derives from the determiner "a", and the engine(x')
derives from the lexical entry for "engine".)

(41) ∃x' engine(x')

When "fixed" combines with "an engine", (41)
enters a quantifier store [?], and x', the variable
used in (41), fills the second argument-slot of the
fixing(_, _) in (40). The verb phrase "fixed an
engine" inherits the semantics of its head daughter
("fixed"), but now the second argument of the
predicate fixing(_, _) is x':

1 In our system, the person that configures the lexicon 
needs to provide lexicon entries for only the base forms of 
verbs. Lexicon entries for non-base verb forms are generated 
automatically by lexical rules. 



(42) Past[e', Culm[fixing(_, x')]]

Ignoring some details, when the verb phrase
combines with its subject, the constant corresponding
to "John" fills the remaining empty slot of the
predicate fixing(_, x'), and the mother of the verb
phrase inherits the semantics of its head daughter,
which is now:

(43) Past[e', Culm[fixing(john, x')]]

Finally, "on 1/6/94" is mapped to (44). When
"on 1/6/94" combines with "John fixed an
engine", the empty slot of (44) is filled by the
expression of the head daughter, i.e. (43). Thus, (44)
becomes (45).

(44) At[1/6/94, _]

(45) At[1/6/94, Past[e', Culm[fixing(john, x')]]]



(39) is then generated by "unstoring" the contents
of the quantifier store in front of (45), i.e.
by adding (41) in front of (45). In our experimental
system the unstoring operation is quite primitive.
The contents of the quantifier store are simply
added in front of the matrix expression, preserving
the order in which the quantifiers appear in the
sentence. More elaborate unstoring techniques are
possible (see chapter 8 of [?]).
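
The derivation of (39) from (40)-(45) can be mimicked with plain strings. The following Python sketch is a deliberately crude stand-in for the grammar: the real system works with Hpsg feature structures, whereas here a hypothetical Sign record holds a Top expression as a string with "_" for empty slots, plus a quantifier store. All names are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sign:
    sem: str                                        # Top expression, "_" = empty slot
    store: List[str] = field(default_factory=list)  # quantifier store


def fill_first(sem: str, value: str) -> str:
    """Fill the leftmost empty slot."""
    i = sem.index("_")
    return sem[:i] + value + sem[i + 1:]


def fill_last(sem: str, value: str) -> str:
    """Fill the rightmost empty slot."""
    i = sem.rindex("_")
    return sem[:i] + value + sem[i + 1:]


def unstore(sign: Sign) -> str:
    # Quantifiers go in front of the matrix expression, in sentence order.
    return " ∧ ".join(sign.store + [sign.sem])


fixed = Sign("Past[e', Culm[fixing(_, _)]]")                   # (40)
vp = Sign(fill_last(fixed.sem, "x'"), ["∃x' engine(x')"])      # (42); (41) is stored
sentence = Sign(fill_first(vp.sem, "john"), vp.store)          # (43)
modified = Sign(fill_first("At[1/6/94, _]", sentence.sem),     # (44) becomes (45)
                sentence.store)
result = unstore(modified)                                     # (39)
```

The complement fills the second argument slot, the subject fills the remaining one, the adjunct wraps the sentence, and only then is the stored quantifier placed in front, matching the order of steps in the text.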

In a similar way, (46) is mapped to (47). In
this case, the lexicon entry associates the present
participle "fixing" with fixing(_, _). The Past
operator in (47) is added by the auxiliary "was".

(46) John was fixing an engine on 1/6/94. 

(47) ∃x' engine(x') ∧
At[1/6/94, Past[e', fixing(john, x')]]

The transformation that cancels the implication
of culminated activity verbs that the climax has
been reached when a "for" adverbial is present
(section 2) has been implemented as a post-processing
rule. (48) is initially mapped to (49). (The For
operator specifies the duration of the event time.)
The post-processing rule then removes any Culm
operator from the interior of a For operator that
has been introduced by a "for ..." adverbial,
resulting in (51), the same formula that expresses
(50).

(48) Housecorp built bridge 2 for two years. 

(49) For[year,2, 

Past[e' , Culm[building(housecorp, bridge2)]]] 

(50) Housecorp was building bridge 2 for two years. 

(51) For[year, 2, 

Past[e' , building(housecorp, bridge2)]] 
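
The post-processing rule amounts to a simple tree rewrite, sketched below in Python. The representation is an assumption made for illustration: formulae are nested tuples whose first element is the operator or predicate name, and every For operator is taken to come from a "for ..." adverbial, so the sketch does not check where a For originated. The function names are hypothetical.

```python
def strip_culm(formula):
    """Remove every Culm operator inside the given (sub)formula."""
    if not isinstance(formula, tuple):
        return formula                      # constants, variables, numbers
    if formula[0] == "Culm":
        return strip_culm(formula[1])       # drop the operator, keep its body
    return tuple(strip_culm(part) for part in formula)


def cancel_culmination(formula):
    """Apply strip_culm to the interior of each For operator."""
    if not isinstance(formula, tuple):
        return formula
    if formula[0] == "For":
        op, unit, count, body = formula
        return (op, unit, count, strip_culm(body))
    return tuple(cancel_culmination(part) for part in formula)


f49 = ("For", "year", 2,
       ("Past", "e'", ("Culm", ("building", "housecorp", "bridge2"))))
f51 = ("For", "year", 2,
       ("Past", "e'", ("building", "housecorp", "bridge2")))
```

Applying `cancel_culmination` to the tuple for (49) yields the tuple for (51), while a Culm that is not inside a For is left untouched.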

Interrogatives like "who", "what", or "which
engine" are treated like normal noun phrases (e.g.
"an engineer"), except that they insert interrogative
quantifiers into the quantifier store. For example,
"which engineer" would cause ?x' engineer(x')
to be inserted into the quantifier store. (52) is
parsed in the same way as (53), except that the
resulting formula contains an interrogative
quantifier rather than an existential one. (Our system
ignores punctuation.)

(52) Which engineer fixed an engine? 

(53) An engineer fixed an engine. 

"When" is treated syntactically as a temporal 
adjunct, like "on 1/6/94" and "yesterday". (54)
is analysed syntactically in the same way as (56).
Unlike adjuncts like "on 1/6/94", however, that
introduce At operators, "when" introduces a ?mxl e'
quantifier (section 3), where e' represents the event
time. The Top formulae for (54) and (56) are given in
(55) and (57) respectively.

(54) When did IBI advertise PPC? 

(55) ?mxl e' Past[e', advertise(ibi, ppc)]

(56) On 1/6/94, did IBI advertise PPC? 

(57) At[1/6/94, Past[e', advertise(ibi, ppc)]]

5 From TOP to TSQL2 

As remarked in section 1, there are various
advantages to the traditional Nlidb architecture in
which natural language queries are systematically 
translated into an intermediate logical language, 
then transformed into expressions in an established 
database language. 

For conventional ("snapshot") relational databases,
the de facto standard query language is SQL [?].
In the newer field of temporal databases, the
position is less clear. More than a dozen temporal
extensions of the relational database model have
been proposed, and there are also several proposals
on how to modify SQL to support the notion of time
(see, for example, [?]). We have chosen to adopt
Tsql2 [27], a temporal extension of SQL that was
recently proposed by a group comprising most lead- 
ing temporal database researchers. We have also 
adopted the proposed conceptual database model 
for Tsql2, called Bcdm, as a formal basis for rea- 
soning about the meaning of Tsql2 expressions, 
and we have assumed that there are abstract func- 
tions which will evaluate a Tsql2 expression with 
respect to an arbitrary Bcdm database. 

A number of modifications to the Tsql2 speci- 
fication have been made during this project. These 
can be classed as follows: (i) There are minor al- 
terations to achieve uniformity or consistency in 
places where the Tsql2 definition contains some 
slight discrepancies. (ii) It could be argued that 
some extensions are desirable to improve the 
expressive power of the language, regardless of 
natural language issues. (iii) Some extra facilities have 
been incorporated to reflect subtleties of meaning 
which are largely motivated by the richness of the 
natural language input, and which might not be 
felt necessary in a purely database context. Nevertheless, 
all such alterations have been kept to the 
minimum, and the resulting version of Tsql2 is 
very close to the original language. As our system 
is a research prototype, we are confident that these 
extensions do not undermine the usefulness of the 
experiment. 

Figure 3: Proving the correctness of the translation 
from Top to Tsql2 

It is not feasible to provide here a complete 
account of our formal method of translation and 
its correctness, but an outline of the approach is 
possible. We assume that a Top expression and a 
Tsql2 query both refer to some universe of world 
objects, relations, etc. (much as in first-order logic), 
including temporal entities (such as intervals). The 
denotation, in terms of this universe, of a Top for- 
mula is provided by the semantic definitions which 
we have provided for the language (see ||). The 
denotation of a Tsql2 expression is given indi- 
rectly, in that we assume that there is an "eval- 
uation" function which will map a Tsql2 query to 
some entities within a Bcdm database, and that 
the semantics of the Bcdm database indicates how 
such database entities are related to the universe 
of world entities, relations, time-intervals, etc. The 
situation is roughly as in figure 3. 

There are a number of translation rules (imple- 
mented in Prolog) for converting Top expressions 
into Tsql2. These are defined recursively, in terms 
of a few basic types of Top expressions and of com- 
binations of expressions. We have proven that the 
rules are correct, in the sense that the denotation 
of a Top query (roughly speaking, its answer) as 
defined by the semantics of Top is the same as the 
denotation (answer) of the corresponding Tsql2 
query when determined by way of the "evaluation" 
function and the Bcdm database semantics (see Q 
for the complete proof). In terms of figure 3 above, 
the same semantic content is reached whether path 
1 or path 2 is chosen. 
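
The shape of this correctness claim can be illustrated on a deliberately tiny example. The Python sketch below is an assumption-laden toy: its three-construct formula language is not Top, its two-construct query language is not Tsql2, and all function names (`denote`, `translate`, `evaluate`) are hypothetical. It defines a direct semantics, a recursive translation with one rule per formula type, and an evaluator for the target language, and then checks on one sample database that both routes give the same answer. Such a check is only a test on one input; the result reported above is a proof covering all inputs.

```python
# Toy illustration only: this is neither Top nor Tsql2.
# Formulae: ("atom", name) | ("past", f) | ("at", (lo, hi), f)
# Queries:  ("scan", name) | ("select", cond, q)


def denote(f, world, now):
    """Direct route: the set of time points at which formula f holds."""
    kind = f[0]
    if kind == "atom":
        return set(world.get(f[1], ()))
    if kind == "past":
        return {t for t in denote(f[1], world, now) if t < now}
    if kind == "at":
        lo, hi = f[1]
        return {t for t in denote(f[2], world, now) if lo <= t <= hi}
    raise ValueError(f"unknown formula: {f!r}")


def translate(f, now):
    """Recursive translation: one rule per type of formula."""
    kind = f[0]
    if kind == "atom":
        return ("scan", f[1])
    if kind == "past":
        return ("select", ("before", now), translate(f[1], now))
    if kind == "at":
        lo, hi = f[1]
        return ("select", ("between", lo, hi), translate(f[2], now))
    raise ValueError(f"unknown formula: {f!r}")


def evaluate(q, db):
    """Route via translation: run a toy query over rows of (name, time)."""
    if q[0] == "scan":
        return {t for (name, t) in db if name == q[1]}
    if q[0] == "select":
        cond, rows = q[1], evaluate(q[2], db)
        if cond[0] == "before":
            return {t for t in rows if t < cond[1]}
        if cond[0] == "between":
            return {t for t in rows if cond[1] <= t <= cond[2]}
    raise ValueError(f"unknown query: {q!r}")


# A sample world, and the database that is assumed to record it faithfully.
world = {"advertise(ibi,ppc)": {3, 7, 12}}
db = {(name, t) for name, times in world.items() for t in times}
now = 10

formula = ("at", (5, 8), ("past", ("atom", "advertise(ibi,ppc)")))
direct = denote(formula, world, now)
via_query = evaluate(translate(formula, now), db)
print(direct, via_query, direct == via_query)
```

Because `translate` mirrors the recursive structure of `denote`, agreement for a compound formula follows from agreement for its parts, which is the pattern an inductive correctness argument exploits.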

6 Future directions 

As mentioned in section |^, there are several kinds 
of English temporal expressions for which we have 
not examined possible representations in Top (e.g. 
expressions referring to the future, temporal adjectives, 
etc.). It would be interesting to explore whether 
our framework can be extended, so that questions 
containing these expressions can also be mapped 
systematically to an (extended version of) Top and 
then to Tsql2. 

A major practical limitation of our prototype 
Nlidb is that it has never been linked to an actual 
database management system (Dbms), mainly because 
until recently no Dbms supported Tsql2. 
This means that the generated Tsql2 queries are 
not executed, and no answers are produced. An 
experimental Dbms, called TimeDb, that supports 
TSQL2 is now available (see ||), and it would be 
interesting to attempt to link our Nlidb to that 
Dbms. This task is complicated by the fact that 
our framework and TimeDb use different versions 
of Tsql2. 